Through the Looking Glass
November 10, 2024
by Gérard Biau and Erwan Scornet [1]
Regression
nodesize points or if all \(X_i\) in the node are identical.
Classification
| | Total Orders | Closed Short | Fulfilled |
|---|---|---|---|
| | (n=7585) | (n=733) | (n=6852) |
| Top Customers | | | |
| Smoothie Island | 1701 (22.43%) | 455 (62.07%) | 1246 (18.18%) |
| Philly Bite | 1556 (20.51%) | 267 (36.43%) | 1289 (18.81%) |
| PlatePioneers | 1396 (18.40%) | 143 (19.51%) | 1253 (18.29%) |
| Berl Company | 906 (11.94%) | 5 (0.68%) | 901 (13.15%) |
| DineLink Intl | 589 (7.77%) | 42 (5.73%) | 547 (7.98%) |
| Top Products | | | |
| DC-01 | 1135 (14.96%) | 345 (47.07%) | 790 (11.53%) |
| TSC-PQB-01 | 1087 (14.33%) | 389 (53.07%) | 698 (10.19%) |
| TSC-PW14X16-01 | 848 (11.18%) | 283 (38.61%) | 565 (8.25%) |
| CMI-PCK-01 | 802 (10.57%) | 288 (39.29%) | 514 (7.50%) |
| PC-05-B1 | 745 (9.82%) | 220 (30.01%) | 525 (7.66%) |
| Top Distributors | | | |
| Ed Don & Company - Miramar | 210 (2.77%) | 0 (0.00%) | 210 (3.06%) |
| PFG- Gainesville | 197 (2.60%) | 0 (0.00%) | 197 (2.88%) |
| Ed Don & Company - Woodridge | 186 (2.45%) | 0 (0.00%) | 186 (2.71%) |
| Ed Don & Company - Mira Loma | 180 (2.37%) | 0 (0.00%) | 180 (2.63%) |
| .Ed Don - Miramar | 162 (2.14%) | 0 (0.00%) | 162 (2.36%) |
| Top Substrates | Paper | Plastic | Bagasse |
| Revenue ($103,826,286) | $54,838,585 (52.82%) | $40,336,669 (38.85%) | $4,350,337 (4.19%) |
| Quantity Ordered | Min | Mean | Max |
| Total Ordered (1,971,237) | 1 | 61.47 | 23,160 |
| Unit Price | Min | Mean | Max |
| Key Stats | $0.16 | $62.60 | $864.00 |
| Total Price | Min | Mean | Max |
| Key Stats | $4.92 | $3,430.74 | $143,084.74 |
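
The percentages in the customer, product, and distributor rows are consistent with column shares: each count divided by the n shown at the top of its column. A minimal Python sketch that checks this for the Smoothie Island row (the variable and function names are illustrative and not from the source):

```python
# Column totals taken from the (n=...) row of the table.
totals = {"Total Orders": 7585, "Closed Short": 733, "Fulfilled": 6852}

# Counts for a single table row (Smoothie Island).
smoothie_island = {"Total Orders": 1701, "Closed Short": 455, "Fulfilled": 1246}


def column_share(count, column_total):
    """Return count as a percentage of its column total, rounded to 2 places."""
    return round(100 * count / column_total, 2)


# One share per column, keyed by the column name.
shares = {col: column_share(smoothie_island[col], totals[col]) for col in totals}
print(shares)
```

Read this way, Smoothie Island accounts for 22.43% of all orders but 62.07% of the closed-short ones.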
Predicting Customer Churn
Random Forest Model Summary
SalesOrderStatus (Fulfilled vs. Unfulfilled) using 100 trees and mtry = 2.
Model Performance Metrics
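
With classes as unbalanced as these (733 closed-short orders against 6852 fulfilled ones in the table above), raw accuracy can look high even when a classifier has learned nothing. Cohen's kappa corrects for this by measuring agreement beyond what chance would produce. The sketch below is a hedged illustration, not the model's actual results: the confusion matrix is a hypothetical baseline that always predicts "Fulfilled", built from the table's class counts, and treating "Closed Short" as the unfulfilled class is an assumption.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of cases whose actual class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected if predictions were independent of the actual class.
    expected = sum((row_totals[i] / n) * (col_totals[i] / n) for i in range(k))
    if expected == 1.0:
        return observed, 0.0  # degenerate case: only one class present
    return observed, (observed - expected) / (1 - expected)


# Hypothetical baseline, NOT the fitted model: every order predicted "Fulfilled".
# Rows are actual [Fulfilled, Closed Short]; columns are predicted in the same order.
baseline = [[6852, 0],
            [733, 0]]
acc, kappa = accuracy_and_kappa(baseline)
print(f"accuracy={acc:.4f}, kappa={kappa:.4f}")
```

The baseline reaches about 90% accuracy while its kappa is zero, which is why a low kappa is a meaningful warning even when accuracy looks good.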
Conclusions
UnitPrice and Product were the most significant predictors for classification.
SalesOrderStatus (e.g., “Fulfilled” vs. “Closed Short”) and the actual status is very low beyond what could be expected by random guessing.
Random Forest Model Summary
QuantityFulfilled using 100 records of sales data.
Model Performance Metrics
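
For a regression target such as QuantityFulfilled, the root mean squared error (RMSE) is the square root of the average squared difference between predicted and actual values, so it is expressed in the same units as the target. A minimal sketch, using made-up quantities rather than the model's data (the function and variable names are illustrative):

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values for illustration only.
actual_qty = [10, 20, 30]
predicted_qty = [12, 18, 33]
error = rmse(actual_qty, predicted_qty)
print(round(error, 2))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.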
Conclusions
QuantityFulfilled with an average error of about 28 units (RMSE).
qtyOrdered, TotalPrice) are substantially more important than categorical ones.